TritonGPUConfig()

Generates a configuration for running Towhee pipelines with a Triton inference server on GPU devices. See Towhee Pipeline in Triton for details.

TritonGPUConfig(device_ids=[0], num_instances_per_device=1, max_batch_size=None, batch_latency_micros=None, preferred_batch_size=None)

Parameters

  • device_ids - list[int]

    • A list of GPU IDs.

    • The value defaults to [0], indicating that only GPU 0 is used.

  • num_instances_per_device - int

    • The number of model instances to run on each of the specified GPU devices.

    • The value defaults to 1.

  • max_batch_size - int or None

    • The maximum batch size that the models in the pipeline support for the types of batching that can be exploited by Triton. See Maximum Batch Size for details, and the sketch after this parameter list for how the batching fields relate to a Triton model configuration.

    • The value defaults to None, leaving Triton to determine the value.

  • batch_latency_micros - int or None

    • The latency allowed, in microseconds, for Triton to process a delivered batch.

    • The value defaults to None, leaving Triton to determine the value.

  • preferred_batch_size - list[int] or None

    • A list of batch sizes that the Triton dynamic batcher should attempt to create.

    • The value defaults to None, leaving Triton to determine the value.
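
For orientation, the batching and instance fields above roughly correspond to fields in Triton's model configuration (config.pbtxt). Below is a hedged sketch of that correspondence, using the values from the second example further down; the actual file is generated by Towhee's Triton builder and may differ, and the field names here are Triton's, not Towhee's:

max_batch_size: 128
instance_group [
  { kind: KIND_GPU, count: 3, gpus: [ 0 ] }
]
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]
  max_queue_delay_microseconds: 100000
}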

Returns

A TowheeConfig object whose server key is set to a dictionary containing the specified parameters and their values; preferred_batch_size is nested under a triton sub-dictionary, as shown in the example below.

Example

from towhee import pipe, ops, AutoConfig

auto_config1 = AutoConfig.TritonGPUConfig()
auto_config1.config
# returns {'server': {'device_ids': [0], 'num_instances_per_device': 1, 'max_batch_size': None, 'batch_latency_micros': None, 'triton': {'preferred_batch_size': None}}}

# ...or set the configuration explicitly
auto_config2 = AutoConfig.TritonGPUConfig(device_ids=[0],
                                          num_instances_per_device=3,
                                          max_batch_size=128,
                                          batch_latency_micros=100000,
                                          preferred_batch_size=[8, 16])
auto_config2.config
# returns {'server': {'device_ids': [0], 'num_instances_per_device': 3, 'max_batch_size': 128, 'batch_latency_micros': 100000, 'triton': {'preferred_batch_size': [8, 16]}}}

# configurations can also be combined by adding them
auto_config3 = AutoConfig.LocalGPUConfig() + AutoConfig.TritonGPUConfig()
auto_config3.config
# returns {'device': 0, 'server': {'device_ids': [0], 'num_instances_per_device': 1, 'max_batch_size': None, 'batch_latency_micros': None, 'triton': {'preferred_batch_size': None}}}
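
Below is a hedged sketch of attaching the configuration to a pipeline node. The config keyword on map and the operators used here are assumptions for illustration, not part of this reference:

from towhee import pipe, ops, AutoConfig

# A minimal sketch, assuming map() accepts a config keyword and that the
# image_decode.cv2 and image_embedding.timm operators are available.
p = (
    pipe.input('url')
        .map('url', 'image', ops.image_decode.cv2())
        .map('image', 'vec',
             ops.image_embedding.timm(model_name='resnet50'),
             config=AutoConfig.TritonGPUConfig(device_ids=[0],
                                               max_batch_size=128))
        .output('vec')
)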